VIDEO ENCODING WITH SKIPPING MOTION ESTIMATION FOR SELECTED MACROBLOCKS

The invention relates to video encoders and in particular to reducing the computational complexity when encoding video.

Video encoders and decoders (CODECs) based on video encoding standards such as H.263 and MPEG-4 are well known in the art of video compression.



The development of these standards has led to the ability to send video over much smaller bandwidths with only a minor reduction in quality. However, decoding and, more specifically, encoding require a significant amount of computational processing resources. For mobile devices, such as personal digital assistants (PDAs) or mobile telephones, power usage is closely related to processor utilisation and therefore relates to the life of the battery charge. It is obviously desirable to reduce the amount of processing in mobile devices to increase the operable time of the device for each battery charge. In general-purpose personal computers, CODECs must share processing resources with other applications. This has contributed to the drive to reduce processing utilisation, and therefore power drain, without compromising viewing quality.

In many video applications, such as teleconferences, the majority of the area captured by the camera is static. In these cases, power resources or processor resources are being used unnecessarily to encode areas which have not changed significantly from a reference video frame.

The typical steps required to process the pictures in a video by an encoder, such as one that is H.263 or MPEG-4 Simple Profile compatible, are described as an example.

The first step requires that reference pictures be selected for the current picture. These reference pictures are divided into non-overlapping macroblocks. Each macroblock comprises four luminance blocks and two chrominance blocks, each block comprising 8 pixels by 8 pixels.

It is well known that the steps in the encoding process that typically require the greatest computational time are the motion estimation, the forward discrete cosine transform (FDCT) and the inverse discrete cosine transform (IDCT).

The motion estimation step looks for similarities between the current picture and one or more reference pictures. For each macroblock in the current picture, a search is carried out to identify a prediction macroblock in the reference picture which best matches the current macroblock in the current picture. The prediction macroblock is identified by a motion vector (MV) which indicates a distance offset from the current macroblock. The prediction macroblock is then subtracted from the current macroblock to form a prediction error (PE) macroblock. This PE macroblock is then discrete cosine transformed, which transforms an image from the spatial domain to the frequency domain and outputs a matrix of coefficients relating to the spectral sub-bands. For most pictures much of the signal energy is at low frequencies, which is what the human eye is most sensitive to. The formed DCT matrix is then quantized, which involves dividing the DCT coefficients by a quantizer value and then rounding to the nearest integer. This has the effect of reducing many of the higher frequency coefficients to zeros and is the step that will cause distortion to the image. Typically, the higher the quantizer step size, the poorer the quality of the image. The values from the matrix after the quantizer step are then re-ordered by "zigzag" scanning. This involves reading the values from the top left-hand corner of the matrix diagonally back and forward down to the bottom right-hand corner of the matrix. This tends to group the zeros together, which allows the stream to be efficiently run-level encoded (RLE) before eventually being converted into a bitstream by entropy encoding. Other "header" data is usually added at this point.
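The quantization, zigzag scanning and run-level steps described above can be illustrated with a minimal sketch. The function names (`quantize`, `zigzag`, `run_level`) are hypothetical and are not taken from any standard or from this specification; the sketch assumes a square block of coefficients and a single scalar quantizer value, and it ignores details such as end-of-block signalling that a real codec would define.

```python
def quantize(block, q):
    """Divide each DCT coefficient by the quantizer value q and round.

    Note: Python's round() sends exact .5 ties to the even integer;
    a real codec specifies its own rounding rule.
    """
    return [[round(c / q) for c in row] for row in block]


def zigzag(block):
    """Read a square block along its anti-diagonals, alternating direction,
    from the top left-hand corner to the bottom right-hand corner."""
    n = len(block)
    order = sorted(
        ((i, j) for i in range(n) for j in range(n)),
        # Odd diagonals run downwards (row increasing), even ones upwards.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )
    return [block[i][j] for i, j in order]


def run_level(values):
    """Encode a scanned sequence as (run of zeros, non-zero level) pairs.

    Trailing zeros are simply dropped in this sketch.
    """
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs


# Illustrative 4x4 coefficient block (made-up values) and quantizer value.
coeffs = [[50, -13, 3, 1],
          [9, 3, 1, 0],
          [2, 1, 0, 0],
          [1, 0, 0, 0]]
quantized = quantize(coeffs, 8)
pairs = run_level(zigzag(quantized))
```

With a larger quantizer value more coefficients round to zero, so the list of run-level pairs becomes shorter; when it is empty and the MV is zero, the macroblock carries no data at all.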

If the MV is equal to zero and the quantized DCT coefficients are all equal to zero then there is no need to include encoded data for the macroblock in the encoded bitstream. Instead, header information is included to indicate that the macroblock has been "skipped".

US 6,192,148 discloses a method for predicting whether a macroblock should be skipped prior to the DCT steps of the encoding process. This method decides whether to complete the steps after the motion estimation if the MV has been returned as zero, the mean absolute difference of the luminance values of the macroblock is less than a first threshold, and the mean absolute difference of the chrominance values of the macroblock is less than a second threshold.

For the total encoding process, the motion estimation and the FDCT and IDCT are typically the most processor intensive. The prior art only predicts skipped blocks after the step of motion estimation and therefore still contains a step in the process that can be considered processor intensive.
The present invention discloses a method to predict skipped macroblocks that requires no motion estimation or DCT steps.

According to the present invention there is provided a method of encoding video pictures comprising the steps of:
dividing the picture into regions;
predicting whether each region requires processing through further steps, said predicting step comprising comparing one or more statistical measures with one or more threshold values for each region.

Hence, the invention avoids unnecessary use of resources by avoiding processor intensive operations where possible.

The further steps preferably include motion estimation and/or transform processing steps.

Preferably the transform processing step is a discrete cosine transform processing step.

A region is preferably a non-overlapping macroblock.

A macroblock is preferably a sixteen by sixteen matrix of pixels.

Preferably, one of the statistical measures is whether an estimate of the energy of some or all pixel values of the macroblock, optionally divided by the quantizer step size, is less than a predetermined threshold value.

Alternatively or further preferably, one of the statistical measures is whether an estimate of the values of certain discrete cosine transform coefficients for one or more sub-blocks of the macroblock is less than a second threshold value.

Alternatively, one of the statistical measures is whether an estimate of the distortion due to skipping the macroblock is less than a predetermined threshold value.

Preferably, the estimate of distortion is calculated by deriving one or more statistical measures from some or all pixel values of one or more previously coded macroblocks with respect to the macroblock.

The estimate of distortion may be calculated by subtracting an estimate of the sum of absolute differences of luminance values of a coded macroblock with respect to a previously coded macroblock (SAE_noskip) from the sum of absolute differences of luminance values of a skipped macroblock with respect to a previously coded macroblock (SAE_skip).

SAE_noskip may be estimated by a constant value K or, in a more accurate method, by the sum of absolute differences of luminance values of a previously coded macroblock and, if there is no previously coded macroblock, by a constant value K.

Further preferably, the method of encoding pictures may be performed by a computer program embodied on a computer usable medium.

Further preferably, the method of encoding pictures may be performed by electronic circuitry.

The estimate of the values of certain discrete cosine transform coefficients may involve:
dividing the sub-blocks into four equal regions;
calculating the sum of absolute differences of the residual pixel values for each region of the sub-block, where the residual pixel value is a corresponding reference (previously coded) pixel luminance value subtracted from the current pixel luminance value;
estimating the low frequency discrete cosine transform coefficients for each region of the sub-blocks, such that:

$$Y_{01} = \mathrm{abs}(A + C - B - D)$$
$$Y_{10} = \mathrm{abs}(A + B - C - D)$$
$$Y_{11} = \mathrm{abs}(A + D - B - C)$$

where Y_01, Y_10 and Y_11 represent the estimations of three low frequency discrete cosine transform coefficients and A, B, C and D represent the sum of absolute differences of each of the regions of the sub-block, where A is the top left hand corner, B is the top right hand corner, C is the bottom left hand corner and D is the bottom right hand corner; and
selecting the maximum value of the estimate of the discrete cosine transform coefficients from all the estimates calculated.

It should be appreciated that, in the art, referring to pixel values refers to any of the three components that make up a colour pixel, namely, a luminance value and two chrominance values. In some instances, "sample" value is used instead of pixel value to refer to one of the three component values and this should be considered interchangeable with pixel value.

It also should be appreciated that a macroblock can be any region of pixels, of a particular size, within the frame of interest.

The invention will now be described, by way of example, with reference to the figures of the drawings in which:

Figure 1 shows a flow diagram of a video picture encoding process.

Figure 2 shows a flow diagram of a macroblock encoding process.

Figure 3 shows a flow diagram of a prediction decision process.

Figure 4 shows a flow diagram of an alternative prediction decision process.

With reference to Figure 1, a first step 102 reads a picture frame in a video sequence and divides it into non-overlapping macroblocks (MBs). Each MB comprises four luminance blocks and two chrominance blocks, each block comprising 8 pixels by 8 pixels. Step 104 encodes the MB as shown in Figure 2.

With reference to Figure 2, a MB encoding process 104 is shown, where a decision step 202 is performed before any other step.

The H.263 encoding process currently teaches that each MB in the video encoding process typically goes through steps 204 to 226 or equivalent processes, in the order shown in Figure 2 or in a different order. Motion estimation step 204 identifies one or more prediction MB(s), each of which is defined by a MV indicating a distance offset from the current MB and a selection of a reference picture. Motion compensation step 206 subtracts the prediction MB from the current MB to form a Prediction Error (PE) MB. If the value of the MV needs to be encoded (step 208), then the MV is entropy encoded (step 210), optionally with reference to a predicted MV.

Each block of the PE MB is then forward discrete cosine transformed (FDCT) 212, which outputs a block of coefficients representing the spectral sub-bands of each of the PE blocks. The coefficients of the FDCT block are then quantized (for example through division by a quantizer step size) 214 and then rounded to the nearest integer. This has the effect of reducing many of the coefficients to zero. If there are any non-zero quantized coefficients (QCoeff) 216 then the resulting block is entropy encoded by steps 218 to 222.

In order to form a reconstructed picture for further predictions, the quantized coefficients (QCoeff) are re-scaled (for example by multiplication by a quantizer step size) 224 and transformed with an inverse discrete cosine transform (IDCT) 226. After the IDCT the reconstructed PE MB is added to the reference MB and stored for further prediction.

The decision step 228 looks at the output of the prior processes and if the MV is equal to zero and all the QCoeffs are zero then the encoded information is not written to the bitstream but a skip MB indication is written instead. This means that all the processing time that has been used to encode the MB has not been necessary because the MB is regarded as similar to or the same as the previous MB.

As one embodiment of the invention, in Figure 2 decision step 202 predicts whether the current MB is likely to be skipped, that is, that after the process steps 202 to 226 the MB is not coded but a skip indication is written instead. If decision step 202 does predict that the MB would be skipped, the MB is not passed on to step 204 and the following process steps; instead, skip information is passed directly to step 232.

With reference to Figure 3, a flow diagram is shown of the decision to skip the MB 202.

MBs that are skipped have zero MV and QCoeff. Both of these conditions are likely to be met if there is a strong similarity between the current MB and the same MB position in the reference frame. The energy of a residual MB formed by subtracting the reference MB, without motion compensation, from the current MB is approximated by the sum of absolute differences for the luminance part of the MB with zero displacement (SAD0_MB), given by:

$$\mathrm{SAD0}_{MB} = \sum_{i=0}^{15} \sum_{j=0}^{15} \left| C_c(i,j) - C_p(i,j) \right|$$
Equation 1

C_c(i,j) and C_p(i,j) are luminance samples from an MB in the current frame and in the same position in the reference frame, respectively.
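As a minimal sketch of Equation 1, the zero-displacement sum of absolute differences can be computed directly from two co-located luminance arrays. The function name `sad0_mb` and the list-of-rows representation are assumptions made for illustration only.

```python
def sad0_mb(current, reference):
    """Sum of absolute differences between co-located luminance samples.

    `current` and `reference` are equally sized lists of rows (16 x 16 for
    a macroblock); no motion compensation is applied.
    """
    return sum(
        abs(c - p)
        for cur_row, ref_row in zip(current, reference)
        for c, p in zip(cur_row, ref_row)
    )


# Illustrative data: a flat 16x16 MB that is 2 levels brighter than the
# co-located reference MB.
reference_mb = [[98] * 16 for _ in range(16)]
current_mb = [[100] * 16 for _ in range(16)]
sad = sad0_mb(current_mb, reference_mb)  # 256 samples, each differing by 2
```

A static background region gives a value close to zero, which is the case the skip prediction is intended to detect.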

The relationship between SAD0_MB and the probability that the MB will be skipped also depends on the quantizer step size, since a higher step size typically results in an increased proportion of skipped MBs.

A comparison of the calculation SAD0_MB (optionally divided by the quantizer step size (Q)) 302 to a first threshold value gives a first comparison step 304. If the calculated value is greater than the first threshold value then the MB is passed to step 204 and enters a normal encoding process. If the calculated value is less than the first threshold value then a second calculation is performed 306.

Step 306 performs additional calculations on the residual MB. Each 8x8 luminance block is divided into four 4x4 blocks. A, B, C and D (Equation 2) are the SAD values of each 4x4 block and R(i,j) are the residual pixel values without motion compensation.

$$A = \sum_{i=0}^{3} \sum_{j=0}^{3} |R(i,j)| \qquad B = \sum_{i=0}^{3} \sum_{j=4}^{7} |R(i,j)|$$
$$C = \sum_{i=4}^{7} \sum_{j=0}^{3} |R(i,j)| \qquad D = \sum_{i=4}^{7} \sum_{j=4}^{7} |R(i,j)|$$
Equation 2



Y_01, Y_10 and Y_11 (Equation 3) provide a low-complexity estimate of the magnitudes of the three low frequency DCT coefficients coeff(0,1), coeff(1,0) and coeff(1,1) respectively. If any of these coefficients is large then there is a high probability that the MB should not be skipped. Y4x4_block (Equation 4) is therefore used to predict whether each block may be skipped. The maximum for the luminance part of a macroblock is calculated using Equation 5.

$$Y_{01} = \mathrm{abs}(A + C - B - D) \qquad Y_{10} = \mathrm{abs}(A + B - C - D)$$
$$Y_{11} = \mathrm{abs}(A + D - B - C)$$
Equation 3



WO 2004/056125 



PCT/GB2003/005526 




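$$Y4{\times}4_{\mathrm{block}} = \max(Y_{01}, Y_{10}, Y_{11})$$
Equation 4

$$Y4{\times}4_{\max} = \max(Y4{\times}4_{\mathrm{block}1}, Y4{\times}4_{\mathrm{block}2}, Y4{\times}4_{\mathrm{block}3}, Y4{\times}4_{\mathrm{block}4})$$
Equation 5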
The calculated value of Y4x4_max is compared with a second threshold 308. If the calculated value is less than the second threshold then the MB is skipped and the next step in the process is 232. If the calculated value is greater than the second threshold then the MB is passed to step 204 and the subsequent steps for encoding.
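The two-stage decision of Figure 3 (steps 302 to 308) might be sketched as follows. All function and parameter names (`block_sads`, `y4x4_block`, `predict_skip`, `t1`, `t2`) are hypothetical. The sketch assumes a 16 x 16 luminance array, always divides SAD0_MB by Q (the text makes this optional), and treats a value exactly equal to a threshold as "do not skip", which the text leaves unspecified.

```python
def block_sads(residual, top, left):
    """Return (A, B, C, D), the SADs of the four 4x4 quadrants of the
    8x8 block whose top-left sample is at (top, left) (Equation 2)."""
    def quad(r0, c0):
        return sum(abs(residual[r][c])
                   for r in range(r0, r0 + 4)
                   for c in range(c0, c0 + 4))
    return (quad(top, left), quad(top, left + 4),
            quad(top + 4, left), quad(top + 4, left + 4))


def y4x4_block(a, b, c, d):
    """Largest of the three low frequency estimates (Equations 3 and 4)."""
    y01 = abs(a + c - b - d)
    y10 = abs(a + b - c - d)
    y11 = abs(a + d - b - c)
    return max(y01, y10, y11)


def predict_skip(current, reference, q, t1, t2):
    """Return True if the MB is predicted to be skipped."""
    # Residual without motion compensation.
    residual = [[c - p for c, p in zip(cr, rr)]
                for cr, rr in zip(current, reference)]
    sad0 = sum(abs(v) for row in residual for v in row)
    # First comparison (step 304): too much residual energy, encode normally.
    if sad0 / q >= t1:
        return False
    # Second calculation (step 306): maximum over the four 8x8 luminance
    # blocks (Equation 5), then the second comparison (step 308).
    y_max = max(y4x4_block(*block_sads(residual, top, left))
                for top in (0, 8) for left in (0, 8))
    return y_max < t2
```

A uniform brightness change gives A = B = C = D in every block, so the three estimates are zero and only the first threshold decides; a change concentrated in one 4x4 region gives a large estimate and sends the MB to normal encoding.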

These steps typically have very little impact on computational complexity. SAD0_MB is normally computed in the first step of any motion estimation algorithm and so there is no extra calculation required. Furthermore, the SAD values of each 4x4 block (A, B, C and D in Equation 2) may be calculated without penalty if SAD0_MB is calculated by adding together the values of SAD for each 4x4-sample sub-block in the MB.

The additional computational requirements of the classification algorithm are the operations in Equations 3, 4 and 5 and these are typically not computationally intensive.

With reference to Figure 4, a flow diagram is shown in which a further embodiment of the decision to skip the MB 202 is described.






In the previous embodiment (Fig. 3), the decision to skip the MB 202 was based on the luminance of the current MB compared to the reference MB. In the present embodiment, the decision to skip the MB 202 is based on the estimated distortion that would be caused due to skipping the MB.

When a decoder decodes a MB, the coded residual data is decoded and added to motion-compensated reference frame samples to produce a decoded MB. The distortion of a decoded MB relative to the original, uncompressed MB data can be approximated by Mean Squared Error (MSE). MSE for the luminance samples a_ij of a decoded MB, compared with the original luminance samples b_ij, is given by:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i,j} (a_{ij} - b_{ij})^2$$
Equation 6

where N is the number of luminance samples in the MB.


Define MSE_noskip as the luminance MSE for a macroblock that is coded and transmitted and define MSE_skip as the luminance MSE for a MB that is skipped (not coded). When a MB is skipped, the MB data in the same position in the reference frame is inserted in that position by the decoder. For a particular MB position, an encoder may choose to code the MB or to skip it. The difference in distortion, MSE_diff, between skipping or coding the MB is defined as:

$$\mathrm{MSE}_{\mathrm{diff}} = \mathrm{MSE}_{\mathrm{skip}} - \mathrm{MSE}_{\mathrm{noskip}}$$
Equation 7

If MSE_diff is zero or has a low value, then there is little or no "benefit" in coding the MB since a very similar reconstructed result will be obtained if the MB is skipped. A low value of MSE_diff will include MBs with a low value of MSE_skip where the MB in the same position in the reference frame is a good match for the current MB. A low value of MSE_diff will also include MBs with a high value of MSE_noskip where the decoded, reconstructed MB is significantly different from the original due to quantization distortion.

The purpose of selectively skipping MBs is to save computation. MSE is not typically calculated in an encoder and so an additional computational cost would be required to calculate Equation 7. Sum of Absolute Errors (SAE) for the luminance samples of a decoded MB is given by:

$$\mathrm{SAE} = \sum_{i,j} \left| a_{ij} - b_{ij} \right|$$
Equation 8
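The two distortion measures of Equations 6 and 8 can be compared with a short sketch. The function names `mse` and `sae` are hypothetical, and dividing the squared error by the number of samples is an assumption of this sketch (the usual definition of a mean), not a normalisation confirmed by the text.

```python
def sae(decoded, original):
    """Sum of absolute errors between decoded and original samples."""
    return sum(abs(a - b)
               for dec_row, org_row in zip(decoded, original)
               for a, b in zip(dec_row, org_row))


def mse(decoded, original):
    """Mean squared error between decoded and original samples.

    Assumption: the sum of squared errors is divided by the sample count.
    """
    n = sum(len(row) for row in original)
    total = sum((a - b) ** 2
                for dec_row, org_row in zip(decoded, original)
                for a, b in zip(dec_row, org_row))
    return total / n


# Tiny 2x2 illustration with made-up sample values.
original = [[10, 10], [10, 10]]
decoded = [[10, 12], [8, 10]]
errors = (sae(decoded, original), mse(decoded, original))
```

SAE needs only subtractions, absolute values and additions, whereas MSE needs a multiplication per sample, which is why the cheaper measure is attractive when the goal is to save computation.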

SAE is approximately monotonically increasing with MSE and so is a suitable alternative measure of distortion to MSE. Therefore SAE_diff, the difference in SAE between a skipped MB and a coded MB, is used as an estimate of the increase in distortion due to skipping a MB:

$$\mathrm{SAE}_{\mathrm{diff}} = \mathrm{SAE}_{\mathrm{skip}} - \mathrm{SAE}_{\mathrm{noskip}}$$
Equation 9

SAE_skip is the sum of absolute errors between the uncoded MB and the luminance data in the same position in the reference frame. This is typically calculated as the first step of a motion estimation algorithm in the encoder and is usually termed SAE_00. Therefore, SAE_skip is readily available at an early stage of processing of each MB.

SAE_noskip is the SAE of a decoded MB, compared with the original uncoded MB, and is not normally calculated during coding or decoding. Furthermore, SAE_noskip cannot be calculated if the MB is actually skipped. A model for SAE_noskip is therefore required in order to calculate Equation 9.

A first model is as follows:

SAE_noskip = K (where K is a constant).

It follows that SAE_diff is calculated as:

$$\mathrm{SAE}_{\mathrm{diff}} = \mathrm{SAE}_{\mathrm{skip}} - K$$
Equation 10

This model is computationally simple but is unlikely to be accurate because there are many MBs that do not fit a simple linear trend.

An alternative model is as follows:

SAE_noskip(i, n) = SAE_noskip(i, n-1)

where i is the current MB number, n is the current frame and n-1 is the previous coded frame.

This model requires the encoder to compute SAE_noskip, a single calculation of Equation 8 for each coded MB, but provides a more accurate estimate of SAE_noskip for the current MB. If MB(i, n-1) is a MB that was skipped, then SAE_noskip(i, n-1) cannot be calculated and it is necessary to revert to the first model.

Based on Equation 9 and using the models described above, two algorithms for selectively skipping and therefore not processing MBs are as follows:

Algorithm (1):

    if (SAE_00 - K) < T
        skip current MB
    else
        code current MB
Algorithm (1) uses a simple approximation for SAE_noskip but is straightforward to implement.

Algorithm (2):

    if (MB(i, n-1) has been coded)
        SAE_noskip{estimate} = SAE_noskip(i, n-1)
    else
        SAE_noskip{estimate} = K
    if (SAE_00 - SAE_noskip{estimate}) < T
        skip current MB
    else
        code current MB

Algorithm (2) provides a more accurate estimate of SAE_noskip but requires calculation and storage of SAE_noskip after coding of each non-skipped MB. In both algorithms, a threshold parameter T controls the proportion of skipped MBs. A higher value of T should result in an increased number of skipped MBs but also in an increased distortion due to incorrectly skipped MBs.
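Algorithms (1) and (2) differ only in how SAE_noskip is estimated, which the following sketch makes explicit. The names `should_skip` and `estimate_sae_noskip`, and the use of a dictionary keyed by MB number to hold the values stored from frame n-1, are assumptions made for illustration; the numeric values of K, T and the stored SAE are invented.

```python
def should_skip(sae_00, sae_noskip_estimate, t):
    """Skip the current MB if the estimated extra distortion is below T."""
    return (sae_00 - sae_noskip_estimate) < t


def estimate_sae_noskip(previous, mb_index, k):
    """Estimate SAE_noskip for MB `mb_index` as in Algorithm (2).

    `previous` maps an MB number to the SAE_noskip stored when that MB was
    coded in frame n-1. MBs that were skipped have no entry, so the
    constant K of the first model is used instead.
    """
    return previous.get(mb_index, k)


K, T = 100, 50
previous_frame = {0: 300}  # MB 0 was coded in frame n-1; MB 1 was skipped

# Algorithm (1): constant estimate.
alg1_mb0 = should_skip(320, K, T)
# Algorithm (2): per-MB estimate, falling back to K.
alg2_mb0 = should_skip(320, estimate_sae_noskip(previous_frame, 0, K), T)
alg2_mb1 = should_skip(120, estimate_sae_noskip(previous_frame, 1, K), T)
```

In this made-up case MB 0 is coded under Algorithm (1) but skipped under Algorithm (2), because its stored SAE_noskip shows that coding it previously left a large distortion anyway. An encoder using Algorithm (2) would also need to update the stored value after coding each non-skipped MB, which is omitted here.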

Improvements and modifications to the method of prediction may be incorporated in the foregoing without departing from the scope of the present invention.

For example, SAE_noskip could be estimated by a combination or even a weighted combination of the sum of absolute differences of luminance values of one or more previously coded macroblocks. In addition, SAE_noskip could be estimated by another statistical measure such as sum of squared errors or variance.



